University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory Diamond Tiling: A Tiling Framework for Time-iterated Scientific Applications

Authors

  • Daniel Orozco
  • Guang Gao
Abstract

This paper fully develops Diamond Tiling, a technique for partitioning the computations of stencil applications such as FDTD. Diamond Tiling results from maximizing the amount of useful computation that can be executed once a region of memory has been loaded into the local memory of a multiprocessor chip. It advances the state of the art in time tiling techniques by combining the following characteristics: (1) it maximizes the amount of computation that can be executed per region of memory loaded; (2) this locality optimization is independent of code structure: stencil computations with any loop structure can be optimized, because the data dependencies between the computations are used to partition the program instructions; (3) the resulting program partitions (tiles) are fully parallel and require no redundant computation; (4) code generation is simple and can easily be incorporated into an optimizing compiler; and (5) the technique applies to N-dimensional stencil computations. Experimental evidence supporting these claims is gathered using FDTD, a commonly used stencil application, running on the recently developed Cyclops-64 processor. The results show that stencil applications using Diamond Tiling have a lower running time and perform fewer off-chip memory operations than other state-of-the-art tiling techniques.
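As a rough illustration of the idea summarized above, the following sketch tiles a one-dimensional, three-point stencil into diamonds and checks the result against a plain time loop. It is not the authors' implementation or code generator: the averaging stencil, the function names (`step_point`, `naive`, `diamond_tiled`), and the choice to cut the (t, i) iteration space along the lines t + i and t - i are all assumptions made for this example. Under those assumptions, every dependence of a point lies in the same tile or in a tile with a smaller a + b, so tiles that share the same a + b can run in any order (or in parallel) without recomputing anything.

```python
from collections import defaultdict


def step_point(prev, i):
    """Hypothetical 3-point averaging stencil (a stand-in for an FDTD update)."""
    return (prev[i - 1] + prev[i] + prev[i + 1]) / 3.0


def naive(initial, steps):
    """Reference: one full sweep over space per time step, fixed boundaries."""
    cur = list(initial)
    for _ in range(steps):
        nxt = list(cur)  # copy keeps the two boundary cells unchanged
        for i in range(1, len(cur) - 1):
            nxt[i] = step_point(cur, i)
        cur = nxt
    return cur


def diamond_tiled(initial, steps, width):
    """Same computation, grouped into diamond-shaped tiles of the (t, i) space."""
    n = len(initial)

    # Assign each interior point (t, i) to the tile (a, b), where a and b index
    # slabs of the skewed coordinates t + i and t - i (assumed tile shape).
    tiles = defaultdict(list)
    for t in range(1, steps + 1):
        for i in range(1, n - 1):
            a = (t + i) // width
            b = (t - i) // width  # floor division also handles negatives
            tiles[(a, b)].append((t, i))  # points arrive in increasing t

    # Tiles with equal a + b have no dependences on each other.
    waves = defaultdict(list)
    for (a, b), points in tiles.items():
        waves[a + b].append(points)

    # Two buffers indexed by t mod 2; both start with the initial values so the
    # fixed boundary cells are valid at every time step.
    buf = [list(initial), list(initial)]
    for wave in sorted(waves):
        for points in waves[wave]:  # these tiles could run in parallel
            for t, i in points:
                buf[t % 2][i] = step_point(buf[(t - 1) % 2], i)
    return buf[steps % 2]


if __name__ == "__main__":
    grid = [float((7 * k) % 11) for k in range(40)]
    same = diamond_tiled(grid, 13, 4) == naive(grid, 13)
    print("tiled result matches naive result:", same)
```

The sketch loops over tiles sequentially for clarity; the point is only that each tile performs several time steps on a small region of the grid before moving on, which is the reuse property the abstract describes.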

Publication date: 2010